Getting a subset of a data structure
Problem
You want to get a subset of the elements of a vector, matrix, or data frame.
Solution
To get a subset based on some conditional criterion, the subset()
function or indexing using square brackets can be used. In the examples here, both ways are shown.
    # A sample vector
    v <- c(1,4,4,3,2,2,3)

    subset(v, v<3)
    v[v<3]
    # 1 2 2

    # Another vector
    t <- c("small", "small", "large", "medium")

    # Remove "small" entries
    subset(t, t!="small")
    t[t!="small"]
    # "large"  "medium"
One important difference between the two methods is that you can assign values to elements with square bracket indexing, but you cannot with subset().
    v[v<3] <- 9
    # 9 4 4 3 9 9 3

    subset(v, v<3) <- 9
    # Error in subset(v, v < 3) <- 9 : could not find function "subset<-"
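Another difference shows up when the data contains missing values: subset() treats an NA condition as FALSE and drops that element, while a logical index in square brackets returns an NA in its place. The following is a minimal sketch; the vector w is a made-up example, not part of the recipe above.

```r
# Hypothetical vector with one missing value
w <- c(1, NA, 4, 2)

# subset() treats the NA comparison as FALSE, so the NA is dropped
kept_subset <- subset(w, w < 3)

# Bracket indexing with a logical NA yields an NA element
kept_bracket <- w[w < 3]

# Wrapping the condition in which() drops the NA with brackets too
kept_which <- w[which(w < 3)]

print(kept_subset)   # 1 2
print(kept_bracket)  # 1 NA 2
print(kept_which)    # 1 2
```

If you prefer square brackets but want the behavior of subset(), wrapping the condition in which() is one common way to get it.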
With data frames:
    # A sample data frame
    data <- read.table(header=TRUE, con <- textConnection('
     subject sex size
           1   M    7
           2   F    6
           3   F    9
           4   M   11
     '))
    close(con)

    subset(data, subject < 3)
    data[data$subject < 3, ]
    #  subject sex size
    #        1   M    7
    #        2   F    6

    # Subset of particular rows and columns
    subset(data, subject < 3, select = -subject)
    subset(data, subject < 3, select = c(sex,size))
    subset(data, subject < 3, select = sex:size)
    data[data$subject < 3, c("sex","size")]
    #  sex size
    #    M    7
    #    F    6

    # Logical AND of two conditions
    subset(data, subject < 3 & sex=="M")
    data[data$subject < 3 & data$sex=="M", ]
    #  subject sex size
    #        1   M    7

    # Logical OR of two conditions
    subset(data, subject < 3 | sex=="M")
    data[data$subject < 3 | data$sex=="M", ]
    #  subject sex size
    #        1   M    7
    #        2   F    6
    #        4   M   11

    # Condition based on transformed data
    subset(data, log2(size) > 3)
    data[log2(data$size) > 3, ]
    #  subject sex size
    #        3   F    9
    #        4   M   11

    # Subset if elements are in another vector
    subset(data, subject %in% c(1,3))
    data[data$subject %in% c(1,3), ]
    #  subject sex size
    #        1   M    7
    #        3   F    9
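The same two approaches also work on matrices, which the examples above do not cover. The sketch below assumes a small made-up matrix m and selects rows by a condition on the first column. One thing to watch for: when only one row matches, square brackets simplify the result to a plain vector unless you pass drop = FALSE.

```r
# Hypothetical 4x3 matrix, filled column by column with 1..12
m <- matrix(1:12, nrow = 4)

# Rows whose first column is greater than 2
rows_subset  <- subset(m, m[, 1] > 2)
rows_bracket <- m[m[, 1] > 2, ]

# The same rows, keeping only columns 2 and 3
cols_bracket <- m[m[, 1] > 2, 2:3]

# Only one row matches here: brackets return a vector
# unless drop = FALSE is given
one_row_vec <- m[m[, 1] > 3, ]
one_row_mat <- m[m[, 1] > 3, , drop = FALSE]

print(rows_bracket)
print(one_row_vec)
print(dim(one_row_mat))
```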
Notes
Also see Indexing into a data structure.